141 research outputs found

    Leitlinien verantworteter Technik

    Get PDF

    Character Mapping and Ad-hoc Adaptation: Edinburgh's IWSLT 2020 Open Domain Translation System

    Get PDF
    This paper describes the University of Edinburgh’s neural machine translation systems submitted to the IWSLT 2020 open domain Japanese Chinese translation task. On top of commonplace techniques like tokenisation and corpus cleaning, we explore character mapping and unsupervised decoding-time adaptation. Our techniques focus on leveraging the provided data, and we show the positive impact of each technique through the gradual improvement of BLEU

    Bilingual Document Alignment with Latent Semantic Indexing

    Get PDF
    We apply cross-lingual Latent Semantic Indexing to the Bilingual Document Alignment Task at WMT16. Reduced-rank singular value decomposition of a bilingual term-document matrix derived from known English/French page pairs in the training data allows us to map monolingual documents into a joint semantic space. Two variants of cosine similarity between the vectors that place each document into the joint semantic space are combined with a measure of string similarity between corresponding URLs to produce 1:1 alignments of English/French web pages in a variety of domains. The system achieves a recall of ca. 88% if no in-domain data is used for building the latent semantic model, and 93% if such data is included. Analysing the system's errors on the training data, we argue that evaluating aligner performance based on exact URL matches under-estimates their true performance and propose an alternative that is able to account for duplicates and near-duplicates in the underlying data.Comment: Proceedings of the First Conference on Machine Translation (2016), Volume 2: Shared Task Paper

    A Deterministic Dependency Parser for Japanese

    Get PDF
    We present a rule-based, deterministic dependency parser for Japanese. It was implemented in C ++, using object classes that reflect linguistic concepts and thus facilitate the transfer of linguistic intuitions into code. The parser first chunks morphemes into one-word phrases and then parses from the right to the left. The average parsing accuracy is 83.6%

    Thunderstorm nowcasting with deep learning: a multi-hazard data fusion model

    Full text link
    Predictions of thunderstorm-related hazards are needed in several sectors, including first responders, infrastructure management and aviation. To address this need, we present a deep learning model that can be adapted to different hazard types. The model can utilize multiple data sources; we use data from weather radar, lightning detection, satellite visible/infrared imagery, numerical weather prediction and digital elevation models. It can be trained to operate with any combination of these sources, such that predictions can still be provided if one or more of the sources become unavailable. We demonstrate the ability of the model to predict lightning, hail and heavy precipitation probabilistically on a 1 km resolution grid, with a time resolution of 5 min and lead times up to 60 min. Shapley values quantify the importance of the different data sources, showing that the weather radar products are the most important predictors for all three hazard types.Comment: 15 pages, 3 figures. Submitted to Geophysical Research Letter

    The SUMMA Platform:Scalable Understanding of Multilingual Media

    Get PDF
    We present the latest version of the SUMMA platform, an open-source software platform for monitoring and interpreting multi-lingual media, from written news published on the internet to live media broadcasts via satellite or internet streaming.This work was conducted within the scope of the Research and Innovation Action SUMMA, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688139
    • …
    corecore